-
Benoit, Anne; Kaplan, Haim; Wild, Sebastian; Herman, Grzegorz (Ed.) Data structures on a multiset of genomic k-mers are at the heart of many bioinformatic tools. As genomic datasets grow in scale, the efficiency of these data structures increasingly depends on how well they leverage the inherent patterns in the data. One recent and effective approach is the use of learned indexes that approximate the rank function of a multiset using a piecewise linear function with very few segments. However, theoretical worst-case analysis struggles to predict the practical performance of these indexes. We address this limitation by developing a novel measure of piecewise-linear approximability of the data, called CaPLa (Canonical Piecewise Linear approximability). CaPLa builds on the empirical observation that a power-law model often serves as a reasonable proxy for piecewise-linear approximability, while explicitly accounting for deviations from a true power-law fit. We prove basic properties of CaPLa and present an efficient algorithm to compute it. We then demonstrate that CaPLa can accurately predict space bounds for data structures on real data. Empirically, we analyze over 500 genomes through the lens of CaPLa, revealing that it varies widely across the tree of life and even within individual genomes. Finally, we study the robustness of CaPLa as a measure and the factors that make genomic k-mer multisets different from random ones.
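To make the idea of approximating a rank function with a piecewise linear function concrete, here is a minimal Python sketch. It is not CaPLa and not the authors' algorithm: it uses a generic greedy "shrinking cone" heuristic, which respects an error bound but does not guarantee the fewest possible segments. The names `fit_segments`, `predict_rank`, and the parameter `eps` are hypothetical, and the input is assumed to be a strictly increasing list of numeric keys (for example, distinct k-mers encoded as integers).

```python
from bisect import bisect_right


def _pick_slope(lo, hi):
    # A one-point segment leaves hi unbounded; any slope works there, so use lo.
    return lo if hi == float("inf") else (lo + hi) / 2


def fit_segments(keys, eps):
    """Greedily cover (key, rank) points with line segments.

    Assumption: keys is strictly increasing and rank(keys[r]) == r.
    Each segment is (start_key, start_rank, slope) and predicts every
    key it covers to within eps ranks.
    """
    if not keys:
        return []
    segments = []
    x0, r0 = keys[0], 0            # anchor point of the current segment
    lo, hi = 0.0, float("inf")     # feasible slope interval (the "cone")
    for r in range(1, len(keys)):
        dx = keys[r] - x0
        new_lo = max(lo, (r - r0 - eps) / dx)
        new_hi = min(hi, (r - r0 + eps) / dx)
        if new_lo > new_hi:
            # No single slope fits this point too: close the segment here.
            segments.append((x0, r0, _pick_slope(lo, hi)))
            x0, r0 = keys[r], r
            lo, hi = 0.0, float("inf")
        else:
            lo, hi = new_lo, new_hi
    segments.append((x0, r0, _pick_slope(lo, hi)))
    return segments


def predict_rank(segments, x):
    """Approximate the rank of x using the segment whose start key is <= x."""
    starts = [s[0] for s in segments]
    i = max(bisect_right(starts, x) - 1, 0)
    x0, r0, slope = segments[i]
    return r0 + slope * (x - x0)
```

A learned index would store only the segments and correct the prediction with a short local search of width about `2 * eps`; how few segments suffice for a given dataset is the kind of quantity the abstract's measure is meant to predict.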
-
Benoit, Anne; Kaplan, Haim; Wild, Sebastian; Herman, Grzegorz (Ed.) We consider algorithmic problems motivated by modular robotic reconfiguration in the sliding square model, in which we are given n square-shaped modules in a (labeled or unlabeled) start configuration and need to find a schedule of sliding moves to transform it into a desired goal configuration, maintaining connectivity of the configuration at all times. Recent work has aimed at minimizing the total number of moves, resulting in fully sequential schedules that can perform reconfiguration in 𝒪(n²) moves, or 𝒪(nP) for arrangements of bounding box perimeter size P. We provide first results in the sliding square model that exploit parallel motion, performing reconfiguration in worst-case optimal makespan of 𝒪(P). We also provide tight bounds on the complexity of the problem by showing that even deciding the possibility of reconfiguration within makespan 1 is NP-complete in the unlabeled case. In the labeled variant, we note that deciding the same for makespan 2 is NP-complete, while makespan 1 is straightforward.
-
Benoit, Anne; Kaplan, Haim; Wild, Sebastian; Herman, Grzegorz (Ed.) A covering path for a finite set P of points in the plane is a polygonal path such that every point of P lies on a segment of the path. The vertices of the path need not be at points of P. A covering path is plane if its segments do not cross each other. Let π(n) be the minimum number such that every set of n points in the plane admits a plane covering path with at most π(n) segments. We prove that π(n) ≤ ⌈6n/7⌉. This improves the previous best-known upper bound of ⌈21n/22⌉, due to Biniaz (SoCG 2023). Our proof is constructive and yields a simple O(n log n)-time algorithm for computing a plane covering path.
-
Benoit, Anne; Kaplan, Haim; Wild, Sebastian; Herman, Grzegorz (Ed.) The classical rank aggregation problem seeks to combine a set X of n permutations into a single representative "consensus" permutation. In this paper, we investigate two fundamental rank aggregation tasks under the well-studied Ulam metric: computing a median permutation (which minimizes the sum of Ulam distances to X) and computing a center permutation (which minimizes the maximum Ulam distance to X) in two settings.
- Continuous Setting: In the continuous setting, the median/center is allowed to be any permutation. It is known that computing a center in the Ulam metric is NP-hard and we add to this by showing that computing a median is NP-hard as well via a simple reduction from the Max-Cut problem. While this result may not be unexpected, it had remained elusive until now and confirms a speculation by Chakraborty, Das, and Krauthgamer [SODA '21].
- Discrete Setting: In the discrete setting, the median/center must be a permutation from the input set. We fully resolve the fine-grained complexity of the discrete median and discrete center problems under the Ulam metric, proving that the naive Õ(n² L)-time algorithm (where L is the length of the permutation) is conditionally optimal. This resolves an open problem raised by Abboud, Bateni, Cohen-Addad, Karthik C. S., and Seddighin [APPROX '23]. Our reductions are inspired by the known fine-grained lower bounds for similarity measures, but we face and overcome several new highly technical challenges.
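The naive all-pairs approach to the discrete setting can be sketched as follows. This is an illustration under stated assumptions, not code from the paper: the Ulam distance between two permutations of length L is taken here to be L minus the length of their longest common subsequence (the minimum number of symbol moves), which for permutations reduces to a longest increasing subsequence computation in O(L log L) time. The function names are hypothetical.

```python
from bisect import bisect_left


def ulam_distance(p, q):
    """Ulam distance between permutations p and q of the same symbols.

    Assumption: distance = L - LCS(p, q). Relabelling p by positions in q
    turns the LCS into a longest increasing subsequence (patience sorting).
    """
    pos = {sym: i for i, sym in enumerate(q)}
    tails = []  # tails[k] = smallest tail of an increasing run of length k + 1
    for sym in p:
        v = pos[sym]
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(p) - len(tails)


def discrete_median(perms):
    """Input permutation minimizing the sum of Ulam distances to all inputs."""
    return min(perms, key=lambda c: sum(ulam_distance(c, x) for x in perms))


def discrete_center(perms):
    """Input permutation minimizing the maximum Ulam distance to any input."""
    return min(perms, key=lambda c: max(ulam_distance(c, x) for x in perms))
```

With n input permutations this performs n² distance computations of O(L log L) each, which is consistent with the Õ(n² L) running time quoted in the abstract.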
